AITopics | pure rl

pure rl

Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning

Leiva, Francisco, Ruiz-del-Solar, Javier

arXiv.org Artificial Intelligence, May-15-2024

This paper proposes a method to combine reinforcement learning (RL) and imitation learning (IL) using a dynamic, performance-based modulation over learning signals. The proposed method combines RL and behavioral cloning (IL), or corrective feedback in the action space (interactive IL/IIL), by dynamically weighting the losses to be optimized, taking into account the backpropagated gradients used to update the policy and the agent's estimated performance. In this manner, RL and IL/IIL losses are combined by equalizing their impact on the policy's updates, while modulating said impact such that IL signals are prioritized at the beginning of the learning process, and as the agent's performance improves, the RL signals become progressively more relevant, allowing for a smooth transition from pure IL/IIL to pure RL. The proposed method is used to learn local planning policies for mobile robots, synthesizing IL/IIL signals online by means of a scripted policy. An extensive evaluation of the application of the proposed method to this task is performed in simulations, and it is empirically shown that it outperforms pure RL in terms of sample efficiency (achieving the same level of performance in the training environment using approximately 4 times fewer experiences), while consistently producing local planning policies with better performance metrics (achieving an average success rate of 0.959 in an evaluation environment, outperforming pure RL by 12.5% and pure IL by 13.9%). Furthermore, the obtained local planning policies are successfully deployed in the real world without performing any major fine-tuning. The proposed method can extend existing RL algorithms, and is applicable to other problems for which generating IL/IIL signals online is feasible. A video summarizing some of the real-world experiments that were conducted can be found at https://youtu.be/mZlaXn9WGzw.
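
The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the function names, the use of gradient norms as the measure of "impact", and the linear schedule in the performance estimate are all assumptions made for illustration. Each loss is divided by its gradient norm so that both have comparable influence on the update, and a performance estimate in [0, 1] then shifts the balance from IL toward RL.

```python
def modulated_weights(grad_norm_rl, grad_norm_il, performance, eps=1e-8):
    """Return (w_rl, w_il) for a combined loss w_rl * L_rl + w_il * L_il.

    Hypothetical scheme: dividing by each gradient norm equalizes the two
    signals, and `performance` (clamped to [0, 1]) sets the share given to RL.
    """
    p = min(max(performance, 0.0), 1.0)
    w_rl = p / (grad_norm_rl + eps)          # grows as the agent improves
    w_il = (1.0 - p) / (grad_norm_il + eps)  # dominates early in training
    return w_rl, w_il


def combined_loss(loss_rl, loss_il, grad_norm_rl, grad_norm_il, performance):
    """Weighted sum of the RL and IL losses under the assumed scheme."""
    w_rl, w_il = modulated_weights(grad_norm_rl, grad_norm_il, performance)
    return w_rl * loss_rl + w_il * loss_il
```

Under these assumptions, a performance estimate of 0 yields a pure IL objective and an estimate of 1 yields a pure RL objective, with a smooth blend in between.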

agent, algorithm, robot, (16 more...)

arXiv.org Artificial Intelligence

2405.0976

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

How to fix reinforcement learning

#artificialintelligence, Apr-20-2020, 00:27:58 GMT

"Value functions are a core component of [RL] systems. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g." Here is a rigorous, mathematical formulation of RL that treats goals (the high-level objective of the skill to be learned, which should yield good rewards) as a fundamental and necessary input rather than something to be discovered from just the reward signal. The agent is told what it's supposed to do, just as is done in zero-shot learning and actual human learning. It has been 3 years since this was published, and how many papers have cited it since?
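
To make the V(s, g; θ) signature concrete, here is a minimal sketch of a goal-conditioned value function. The bilinear form, the function names, and the single-sample gradient step are assumptions chosen for brevity, not the architecture from the quoted paper. The point is only that the goal g is an explicit input, so one set of parameters θ can assign different values to the same state under different goals.

```python
def uvfa_value(state, goal, theta):
    """Assumed bilinear approximator: V(s, g; theta) = sum_ij theta[i][j] * s[i] * g[j]."""
    return sum(
        theta[i][j] * state[i] * goal[j]
        for i in range(len(state))
        for j in range(len(goal))
    )


def sgd_step(state, goal, target, theta, lr=0.1):
    """One gradient step on the squared error 0.5 * (V(s, g; theta) - target)**2."""
    error = uvfa_value(state, goal, theta) - target
    return [
        [theta[i][j] - lr * error * state[i] * goal[j] for j in range(len(goal))]
        for i in range(len(state))
    ]


# Hypothetical toy data: one state, two one-hot goals.
theta = [[0.0, 0.0], [0.0, 0.0]]
state, goal_a, goal_b = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
for _ in range(100):
    theta = sgd_step(state, goal_a, 1.0, theta)  # goal_a is rewarding from this state
```

After these updates the same state scores close to 1 under goal_a while its value under goal_b is untouched, which is the kind of generalisation over goals that the quote describes.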

agent, learning, reinforcement, (15 more...)

#artificialintelligence

Genre: Research Report (0.69)

Industry:

Education (0.94)
Leisure & Entertainment > Games (0.72)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

State-Only Imitation Learning for Dexterous Manipulation

Radosavovic, Ilija, Wang, Xiaolong, Pinto, Lerrel, Malik, Jitendra

arXiv.org Machine Learning, Apr-7-2020

Dexterous manipulation has been a long-standing challenge in robotics. Recently, modern model-free RL has demonstrated impressive results on a number of problems. However, complex domains like dexterous manipulation remain a challenge for RL due to the poor sample complexity. To address this, current approaches employ expert demonstrations in the form of state-action pairs, which are difficult to obtain for real-world settings such as learning from videos. In this work, we move toward a more realistic setting and explore state-only imitation learning. To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations. The inverse dynamics model and the policy are trained jointly. Our method performs on par with state-action approaches and considerably outperforms RL alone. By not relying on expert actions, we are able to learn from demonstrations with different dynamics, morphologies, and objects.
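
The action-labeling step mentioned in this abstract can be sketched on a toy problem. Everything here is an assumption for illustration: the one-dimensional states, the linear inverse dynamics model a ≈ w · (s' − s), the closed-form least-squares fit, and the function names. The abstract states that the model and the policy are trained jointly; this sketch only shows how a fitted inverse dynamics model turns a state-only demonstration into state-action pairs.

```python
def fit_inverse_dynamics(transitions):
    """Least-squares fit of w in a ~= w * (s_next - s).

    `transitions` is a list of (s, a, s_next) tuples collected by the agent.
    Assumes at least one transition actually changes the state.
    """
    num = sum((s_next - s) * a for s, a, s_next in transitions)
    den = sum((s_next - s) ** 2 for s, a, s_next in transitions)
    return num / den


def label_demo(states, w):
    """Predict the action between each pair of consecutive demonstration states."""
    return [w * (s_next - s) for s, s_next in zip(states, states[1:])]


# Hypothetical agent experience from dynamics s_next = s + 2 * a.
transitions = [(0.0, 1.0, 2.0), (2.0, -0.5, 1.0), (1.0, 2.0, 5.0)]
w = fit_inverse_dynamics(transitions)

# A state-only demonstration, with no expert actions recorded.
demo_states = [0.0, 1.0, 3.0]
demo_actions = label_demo(demo_states, w)
```

The inferred actions can then be paired with the demonstration states and used as ordinary imitation targets for the policy.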

demonstration, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2004.0465

Country:

North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Reinforcement learning's foundational flaw

#artificialintelligence, Jan-11-2019, 17:21:00 GMT

In this essay, we are going to address the limitations of one of the core fields of AI. In the process, we will encounter a fun allegory, a set of methods of incorporating prior knowledge and instruction into deep learning, and a radical conclusion.[1] The first part, which you're reading right now, will set up what RL is and why it (or at least a particular version of it we shall name 'pure RL' and soon define) is fundamentally flawed. It will contain some explanation that can be skipped by AI practitioners -- but be sure to stick around for the discussion of recent non-pure-RL work that, we shall argue, represents the fix to pure RL's foundational flaw. But for now, let us start with a fun allegory.

machine learning, pure rl, reinforcement learning, (15 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
